Linear Regression with Many Controls of Limited Explanatory Power
Authors
Abstract
We consider inference about a scalar coefficient in a linear regression model. One previously considered approach to dealing with many controls imposes sparsity, that is, it is assumed known that nearly all control coefficients are zero. We instead impose a bound on a weighted sum of squared control coefficients, which is interpretable as a bound on the sample variation in the dependent variable induced by the controls. We develop a simple testing procedure that exploits this additional information in general heteroskedastic models. We also show that under asymptotics where the number of controls is a non-negligible fraction of the number of observations, and the bound is not too large, our suggested test comes close to being weighted-average-power maximizing in the Gaussian homoskedastic model. We compare our procedure to a sparsity-based approach in a Monte Carlo study and by revisiting the empirical relationship between crime and abortion.

Keywords: high-dimensional linear regression, limit of experiments, L2 bound, invariance to linear reparameterizations

We thank participants at various workshops for useful comments and advice. Müller gratefully acknowledges financial support from the National Science Foundation through grant SES-1627660.
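As a rough illustration of how a bound on a weighted sum of squared control coefficients can be read as a bound on the variation in the dependent variable induced by the controls, the sketch below assumes that the weights are the sample second-moment matrix W = Z'Z/n of demeaned controls Z (n observations, p controls). The paper's exact weighting and normalization may differ. Under this assumption, γ'Wγ equals (1/n)Σ(z_i'γ)², the average squared contribution of the controls. The function names, the bound value 0.1, and the simulated design (n = 200, p = 100, every coefficient equal to 0.02) are hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical helpers illustrating an L2-type bound on control coefficients.
# Assumption: weights W = Z'Z / n built from demeaned controls Z (n x p).


def induced_variation(Z: np.ndarray, gamma: np.ndarray) -> float:
    """(1/n) * ||Z @ gamma||^2: average squared contribution of the controls."""
    contribution = Z @ gamma  # length-n vector of control-induced terms
    return float(contribution @ contribution) / Z.shape[0]


def weighted_sq_norm(Z: np.ndarray, gamma: np.ndarray) -> float:
    """gamma' W gamma with W = Z'Z / n: a weighted sum of squared coefficients."""
    W = Z.T @ Z / Z.shape[0]
    return float(gamma @ W @ gamma)


def satisfies_l2_bound(Z: np.ndarray, gamma: np.ndarray, bound: float) -> bool:
    """Check the (assumed) weighted L2 restriction gamma' W gamma <= bound."""
    return weighted_sq_norm(Z, gamma) <= bound


def satisfies_sparsity(gamma: np.ndarray, s: int) -> bool:
    """Check a sparsity restriction: at most s nonzero coefficients."""
    return int(np.count_nonzero(gamma)) <= s


# Simulated design: many controls (p/n = 0.5), each with a small nonzero effect.
rng = np.random.default_rng(0)
n, p = 200, 100
Z = rng.standard_normal((n, p))
Z -= Z.mean(axis=0)       # demean each control column
gamma = np.full(p, 0.02)  # dense but individually small coefficients

print("induced variation:", induced_variation(Z, gamma))
print("weighted sq. norm:", weighted_sq_norm(Z, gamma))
print("L2 bound 0.1 holds:", satisfies_l2_bound(Z, gamma, bound=0.1))
print("sparsity s=5 holds:", satisfies_sparsity(gamma, s=5))
```

In this simulated example every control coefficient is nonzero, so a sparsity restriction allowing only five nonzero coefficients rules this γ out. The weighted L2 bound is still satisfied, because many individually small coefficients jointly induce little variation in the dependent variable.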
Similar resources
tlm User’s Guide: Effects under linear, logistic and Poisson regression models with transformed variables
3 Illustrative examples
3.1 Linear regression model
3.1.1 Log transformation in the response
3.1.2 Log transformation in the explanatory variable
3.1.3 Log transformation in both the response and the explanatory variable ...
Using latent variables in a logistic regression model to remove the effect of multicollinearity in the analysis of some factors associated with breast cancer
Background and Objectives: Logistic regression is one of the most widely used generalized linear models for analyzing the relationships between one or more explanatory variables and a categorical response. Strong correlations among explanatory variables (multicollinearity) considerably reduce the efficiency of the model. In this study we used latent variables to reduce the effects of ...
Robust Estimation in Linear Regression Model: the Density Power Divergence Approach
The minimum density power divergence method provides robust estimates when the dataset contains outliers. In this study, we introduce a robust minimum density power divergence estimator for the parameters of the linear regression model and then, using numerical examples, show the robustness of this est...
Using Neural Networks with Limited Data to Estimate Manufacturing Cost
Neural networks were used to estimate the cost of jet engine components, specifically shafts and cases. The neural network process was compared with results produced by the current conventional cost estimation software and linear regression methods. Due to the complex nature of the parts and the limited amount of information available, data expansion techniques such as doubling-data and data-cr...
Spatial Regression in the Presence of Misaligned data
In this paper, four approaches to the problem of fitting a linear regression model in the presence of spatially misaligned data are presented: the plug-in method, simulation, regression calibration, and maximum likelihood. In the first two approaches, by modeling the correlation of the explanatory variable, predictions of the explanatory variable are obtained at sites...